Ancestral Nucleotide and Amino Acid Sequences
Authors
Z. Yang, S. Kumar and M. Nei
Abstract
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.

It was suggested many years ago that amino acid sequences of present-day species may be used to reconstruct sequences of their extinct ancestors (e.g., PAULING and ZUCKERKANDL 1963; ECK and DAYHOFF 1966, pp. 161-202). The usefulness of reconstruction of ancestral sequences has been well recognized by evolutionary biologists (PAULING and ZUCKERKANDL 1963; MADDISON and MADDISON 1992; SWOFFORD 1993; LIBERTINI and DI DONATO 1994; STEWART 1995). For example, MALCOLM et al. (1990), ADEY et al. (1994), STACKHOUSE et al. (1994), and JERMANN et al.
(1995) used the parsimony approach to infer amino acid sequences of extinct ancestral species, synthesized the genes in the laboratory by site-directed mutagenesis, and produced the gene products (proteins) in bacterial or cultured cells. The physico-chemical properties and physiological functions of these molecules were then studied, with a number of interesting findings (see STEWART 1995 for a review). Reconstruction of ancestral sequences also makes it possible to infer the evolutionary pathway of nucleotide or amino acid substitution at each site of the sequence, and this is useful for identifying specific nucleotide or amino acid changes that caused a functional change of the gene and for detecting convergent evolution or positive Darwinian selection at the nucleotide or amino acid level (e.g., STEWART et al. 1987; SWANSON et al. 1991).

Corresponding author: Ziheng Yang, Department of Integrative Biology, University of California, Berkeley, CA 94720-3140. E-mail: [email protected]. Genetics 141: 1641-1650 (December, 1995).

In the inference of ancestral sequences, the method of maximum parsimony has been used almost exclusively (see the references mentioned above). The method assigns character states (nucleotides or amino acids) to the interior nodes of the tree such that the number of character-state changes along the tree at each site is minimized. Algorithms for reconstructing ancestral character states under this criterion have been developed by FITCH (1971) for rooted bifurcating trees and by HARTIGAN (1973) for general tree topologies (see also ECK and DAYHOFF 1966; MADDISON and MADDISON 1992; SWOFFORD 1993). However, the accuracy of the reconstruction is usually unknown, except for the fact that the reconstruction will be reliable if the sequences are closely related.
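To make the parsimony criterion concrete, the following is a minimal sketch of the bottom-up pass of Fitch's (1971) algorithm for a single site on a rooted bifurcating tree. The tree shape, leaf names, and observed states are hypothetical illustration data and are not taken from the lysozyme analysis. The pass yields the minimum number of changes and preliminary state sets only; a second, top-down pass is needed to resolve the sets into final reconstructions.

```python
# Sketch of the bottom-up (first) pass of Fitch parsimony for one site.
# The tree, leaf names, and states are hypothetical illustration data.

def fitch(tree, leaf_states):
    """Return (state_sets, n_changes) for one site.

    tree: nested 2-tuples with leaf names (str) at the tips.
    leaf_states: dict mapping leaf name -> observed character state.
    state_sets: dict mapping each interior node (its tuple) -> preliminary set.
    """
    state_sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):       # leaf: the set is the observed state
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                      # children share a state: keep the intersection
            result = common
        else:                           # no shared state: take the union, count one change
            result = left | right
            changes += 1
        state_sets[node] = result
        return result

    visit(tree)
    return state_sets, changes


example_tree = ((("A", "B"), "C"), ("D", "E"))
example_states = {"A": "G", "B": "G", "C": "A", "D": "A", "E": "A"}
sets, n_changes = fitch(example_tree, example_states)
```

For this hypothetical site the pass counts one change and leaves the preliminary set {A, G} at the ancestor of A, B, and C. Note that the procedure uses neither branch lengths nor substitution rates.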
As parsimony generally fails to take into account biased substitution rates between nucleotides or amino acids and different branch lengths in the tree, there is concern about the reliability of the parsimony reconstruction (e.g., COLLINS et al. 1994). Furthermore, parsimony often suggests many equally best reconstructions at a site, and there is no natural way of choosing one of them.

In stochastic models used in the maximum-likelihood method of phylogenetic analysis, character states in ancestral sequences are regarded as random variables (e.g., FELSENSTEIN 1981; GOLDMAN 1990). They do not appear in the likelihood function and are normally not estimated. However, the major reason for the lack of a probabilistic approach to character reconstruction seems to be the perception that there are a great many possible reconstructions at a site, so that any single choice would be unlikely to be correct. For example, an (unrooted) tree of 10 amino acid sequences has eight interior nodes, so that at each site there are $20^8 = 2.56 \times 10^{10}$ possible assignments of amino acids to the interior nodes. A method that assigns amino acids to the interior nodes at random would have a probability of $0.39 \times 10^{-10}$ of being correct. At any rate, knowledge of the ancestral sequences is of great biological importance, and it is worth knowing how accurate the reconstruction can be and what factors are important in influencing the accuracy of reconstruction with real data sets.

In this paper we propose a model-based likelihood approach to reconstructing ancestral sequences. The method follows standard statistical theory: given the data at the site, the conditional probabilities of different reconstructions can be compared and the reconstruction having the highest conditional probability is the best estimate at the site.
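The comparison of conditional probabilities can be illustrated with a small sketch. It makes several assumptions that do not come from the paper: a four-leaf tree with two interior nodes, the Jukes-Cantor substitution model as a simple stand-in for whatever model is fitted to real data, and fixed hypothetical branch lengths in place of maximum likelihood estimates. Every assignment of nucleotides to the interior nodes is enumerated, and its joint probability with the observed data is normalized by the site likelihood.

```python
import itertools
import math

NUCLEOTIDES = "ACGT"


def jc69(i, j, t):
    """Jukes-Cantor probability of state j after branch length t, starting from i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def joint_posteriors(leaves, branch_lengths):
    """Posterior probability of every (x, y) assignment to the two interior nodes.

    Hypothetical tree shape: leaves 0 and 1 join node X, leaves 2 and 3 join
    node Y, and X and Y are connected by one internal branch.
    branch_lengths: (t0, t1, t2, t3, t_internal), treated as already estimated.
    """
    t0, t1, t2, t3, t_int = branch_lengths
    joint = {}
    for x, y in itertools.product(NUCLEOTIDES, repeat=2):
        joint[(x, y)] = (
            0.25                                   # equilibrium frequency of x
            * jc69(x, leaves[0], t0) * jc69(x, leaves[1], t1)
            * jc69(x, y, t_int)
            * jc69(y, leaves[2], t2) * jc69(y, leaves[3], t3)
        )
    site_likelihood = sum(joint.values())          # sums over all interior states
    return {states: p / site_likelihood for states, p in joint.items()}


# Hypothetical site pattern and branch lengths (expected substitutions per site).
posteriors = joint_posteriors("AAGG", (0.1, 0.1, 0.1, 0.1, 0.1))
best = max(posteriors, key=posteriors.get)
```

In this toy case the best assignment is A at node X and G at node Y, and `posteriors[best]` is the probability, under the assumed model, that this reconstruction is correct. Enumeration is feasible here because there are only 16 assignments; with eight interior nodes and 20 amino acids it would not be, which is why a practical implementation needs a more efficient algorithm.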
The method allows calculation of the probability that the reconstruction at a site is correct, which provides a natural measure of the accuracy of the reconstruction. Real data will be analyzed to evaluate the accuracy of both the new method of this paper and the parsimony method and to identify factors accounting for the differences between the two methods. The robustness of character reconstruction to the assumed substitution model will also be examined through analysis of the
Related resources
Nucleotide and Amino Acid Changes in HN, F and SH genes of an Iranian Mumps Virus; RS-12, Following Attenuation to Vaccine Strain
Background and Aims: The wild-type RS-12 strain of mumps virus was isolated from an Iranian patient and attenuated after several serial passages. This study was designed to determine nucleotide and amino acid substitutions in the HN, F and SH genes during attenuation of the wild-type virus. Materials and Methods: The required viral samples were prepared at the Razi Vaccine and Serum Institute. Vi...
Nucleotide sequence of cDNA encoding for preprochymosin in native goat (Capra hircus) from Iran
Prochymosin is one of the most important aspartic proteinases used as a milk-clotting enzyme in cheese production. In the present investigation we report the sequence of the cDNA encoding goat (Capra hircus) preprochymosin and compare its nucleotide and deduced amino acid sequences with those of other ruminant preprochymosins. Like bovine prochymosin, the caprine prochymosin cDNA encodes 365 amino ...
Phylogenetic and sequence analysis of the growth hormone gene of two sturgeons, Huso huso and Acipenser Gueldenstaedtii
In this study, the growth hormone cDNAs (cGH) of the beluga sturgeon (Huso huso) and the Russian sturgeon (Acipenser gueldenstaedtii) were cloned and sequenced, and phylogenetic relationships were examined using nucleic acid and amino acid sequences. The nucleotide sequence of the beluga GH has an open reading frame of 645 nucleotides encoding a protein of 214 amino acid residues. The signal peptide cleav...
Nucleotide mutation analyses of isolated lentogenic newcastle disease virus in live bird market
Newcastle disease (ND) is a major viral disease in Indonesia. Its causative agent is an RNA virus belonging to the Paramyxovirinae. RNA viruses are well known to mutate easily, and in some cases mutation can alter virulence. It has been noted that Newcastle disease virus (NDV) with an avirulent amino acid sequence at the cleavage site can mutate into virulent NDV. It is needed to...
Evolutionary features of 8K (KDa) silencing suppressor protein of Potato mop-top virus
The cysteine-rich 8K protein of Potato mop-top virus (PMTV) suppresses host RNA silencing. In this study, an evolutionary analysis of 8K sequences of PMTV isolates was carried out on the basis of nucleotide and amino acid sequences. Twenty-one positively selected sites were identified in the 8K coding regions. Recombination events were found in the 8K of PMTV isolates with a rate of 1.8. In total, 30 haplotyp...